Finite-Sample Analysis of Two-Time-Scale Natural Actor–Critic Algorithm

Authors

Abstract

Actor–critic style two-time-scale algorithms are one of the most popular methods in reinforcement learning, and have seen great empirical success. However, their performance is not completely understood theoretically. In this article, we characterize the global convergence of an online natural actor–critic algorithm in the tabular setting using a single trajectory of samples. Our analysis applies to very general settings, as we only assume ergodicity of the underlying Markov decision process. In order to ensure enough exploration, we employ an $\epsilon$-greedy sampling of the trajectory. For a fixed and small exploration parameter $\epsilon$, we show that the algorithm has a rate of convergence of $\tilde{\mathcal{O}}(1/T^{1/4})$, where $T$ is the number of samples, and this leads to a sample complexity of $\tilde{\mathcal{O}}(1/\delta^{8})$ samples to find a policy that is within an error of $\delta$ from the global optimum. Moreover, by carefully decreasing the exploration parameter $\epsilon$ as the iterations proceed, we present an improved sample complexity of $\tilde{\mathcal{O}}(1/\delta^{6})$ for ...


Similar resources

Finite Sample Properties of Quantile Interrupted Time Series Analysis: A Simulation Study

Interrupted Time Series (ITS) analysis represents a powerful quasi-experimental design in which a discontinuity is enforced at a specific intervention point in a time series, and separate regression functions are fitted before and after the intervention point. Segmented linear/quantile regression can be used in ITS designs to isolate intervention effects by estimating the sudden/level change (...


Time-Discontinuous Finite Element Analysis of Two-Dimensional Elastodynamic Problems using Complex Fourier Shape Functions

This paper reformulates a time-discontinuous finite element method (TD-FEM) based on a new class of shape functions, called complex Fourier hereafter, for solving two-dimensional elastodynamic problems. These shape functions, which are derived from their corresponding radial basis functions, have some advantages such as the satisfaction of exponential and trigonometric function fields in comple...


Critical Discourse Analysis of Two Political Speeches in Light of Bakhtin's Dialogism

Abstract: Bakhtin's dialogism is a theory that respects differences and values dialogue. Various disciplines in the humanities have consistently drawn on this theory. Nevertheless, only a few studies in critical discourse analysis have addressed dialogism. The present study assumes that respect for the rights of others, by providing equal conditions for the expression of different views, is one of the most important commonalities of critical discourse analysis ...


Finite-Sample Analysis of LSTD

In this paper we consider the problem of policy evaluation in reinforcement learning, i.e., learning the value function of a fixed policy, using the least-squares temporal-difference (LSTD) learning algorithm. We report a finite-sample analysis of LSTD. We first derive a bound on the performance of the LSTD solution evaluated at the states generated by the Markov chain and used by the algorithm...


Correction algorithm for finite sample statistics.

Assume in a sample of size $M$ one finds $M_i$ representatives of species $i$ with $i = 1, \ldots, N^*$. The normalized frequency $p_i^* \equiv M_i/M$, based on the finite sample, may deviate considerably from the true probabilities $p_i$. We propose a method to infer rank-ordered true probabilities $r_i$ from measured frequencies $M_i$. We show that the rank-ordered probabilities provide important information o...



Journal

Journal title: IEEE Transactions on Automatic Control

Year: 2023

ISSN: 0018-9286, 1558-2523, 2334-3303

DOI: https://doi.org/10.1109/tac.2022.3190032